Abstract

We present non equilibrium molecular dynamics experiments on the unfolding and refolding of
an alanine decapeptide in vacuo subject to a Nosé-Hoover thermostat. Forward (unfolding) and
reverse (refolding) work distributions are numerically calculated for various duration times
of the non equilibrium experiments. The Crooks theorem is accurately verified for all non
equilibrium regimes, and the time asymmetry of the process is measured using the recently
proposed Jensen-Shannon divergence [E.H. Feng, G. Crooks, Phys. Rev. Lett. 101, 090602].
Results on the alanine decapeptide are found to be similar to recent experimental data on an
m-RNA molecule, thus evidencing the universal character of the Jensen-Shannon divergence.
The patent non Markovianity of the process is rationalized by assuming that the observed
forward and reverse distributions can each be described by a combination of two normal
distributions satisfying the Crooks theorem, representative of two mutually exclusive linear
events. Such a bimodal approach reproduces with surprising accuracy the observed non
Markovian work distributions.

INTRODUCTION

Some time ago Crooks [1] derived, in the context of Monte Carlo simulations, an exact
formula involving the dissipative work of a system driven out of equilibrium through a time
dependent external potential and in contact with a thermal bath at temperature $T = 1/(k_B \beta)$.
This formula, ever since known as the Crooks theorem (CT), reads:

\[ \frac{P(x,\Lambda)}{P(\bar{x},\bar{\Lambda})} = e^{\beta (W - \Delta F)} \qquad (1) \]

where $P(x,\Lambda)$ and $P(\bar{x},\bar{\Lambda})$ are the probabilities of observing a forward
trajectory $x$, given the time schedule (or protocol) $\Lambda$, and of observing its conjugate
trajectory $\bar{x}$ with inverted transformation protocol $\bar{\Lambda}$, respectively;
$\Delta F = F_B - F_A$ is the free energy difference between the initial and final canonical
ensembles and $W$ is the work done in the forward driven non equilibrium experiment. The Crooks
formula was later recognized to be of much broader validity: it was shown to hold for
deterministic systems in the context of classical molecular dynamics simulations [2-5],
Langevin dynamics [7] and quantum systems [8], [9], and it was verified in experiments [12].



The essential points for Eq. 1 to hold are that the driven forward and reverse experiments
must be started from equilibrium distributions and that the transformation protocols of
the forward and reverse processes (which can involve mechanical and thermodynamic variables [5]
as well) are the time reversal of one another. As the work done in the non equilibrium
trajectory changes sign under time reversal, the trajectories and their time-reversed
counterparts can be labeled using the work, so that Eq. 1 can also be written as

\[ \frac{P(W|F)}{P(-W|R)} = e^{\beta (W - \Delta F)} \qquad (2) \]

where $P(W|F)$ and $P(-W|R)$ are the probabilities of observing the work $W$ in the forward
experiment and the work $-W$ in the reverse experiment, respectively. Eq. 2 says that
trajectories that are highly dissipative (i.e. $W - \Delta F \gg 0$) in the forward sense are
difficult to observe in the reverse sense, since for such trajectories the dissipation of the
time-reversed counterpart would be negative, thus transiently violating the second law. In
the functional form of Eq. 2 the Crooks theorem applies, with some provisions [3] related to
the form of the external driving agent, to the controlled mechanical manipulation of a single
molecule through optical tweezers [10] or atomic force microscopy [l^. We conclude these
introductory remarks by stating that Eq. 2, one of the very few exact equations in non
equilibrium thermodynamics, holds for any regime: for instantaneous pulling we have that
$W = H_B - H_A$ and, by averaging over all trajectories, one recovers the Zwanzig [15] formula
$\langle e^{-\beta(H_B - H_A)} \rangle_A = e^{-\beta \Delta F}$. For infinitely slow pulling, i.e. for
quasi-static reversible transformations, $W = \Delta F$, the forward and backward distributions
are indistinguishable, and $P(W|F) = P(-W|R) = \delta(W - \Delta F)$.
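
As a quick numerical illustration of Eq. 2 (not part of the original study), the sketch below assumes a Gaussian forward work distribution with mean $\Delta F + \beta\sigma^2/2$ and a distribution $P(-W|R)$ with mean $\Delta F - \beta\sigma^2/2$ and the same variance, and evaluates the logarithm of their ratio, which should equal $\beta(W - \Delta F)$. The function names and the parameter values are hypothetical choices made for the example.

```python
import math

def log_normal_pdf(w, mean, var):
    """Logarithm of a normal density with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (w - mean) ** 2 / (2.0 * var)

def crooks_log_ratio(w, delta_f, var, beta=1.0):
    """ln[P(W|F) / P(-W|R)] for an assumed Gaussian (Markovian) pair.

    Assumption of this sketch: the forward mean is delta_f + beta*var/2,
    the mean of P(-W|R) is delta_f - beta*var/2, and both share `var`.
    """
    dissipation = 0.5 * beta * var
    log_forward = log_normal_pdf(w, delta_f + dissipation, var)
    log_reverse = log_normal_pdf(w, delta_f - dissipation, var)
    return log_forward - log_reverse

if __name__ == "__main__":
    beta, delta_f, var = 1.0, 3.0, 2.0  # illustrative values, k_B T units
    for w in (0.0, 3.0, 7.5):
        # The two printed numbers should coincide: log ratio vs beta*(W - dF).
        print(w, crooks_log_ratio(w, delta_f, var, beta), beta * (w - delta_f))
```

The two distributions cross at $W = \Delta F$, where the log ratio vanishes, which is the property used later to locate $\Delta F$ from the measured histograms.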

Recently there has been considerable progress in the interpretation of non equilibrium
experiments, coming both from measurements on single molecules using AFM or optical
traps [16] and from deterministic or stochastic simulations [17]. Feng and Crooks proposed
to use the Jensen-Shannon divergence [18], [19] (JSD) between the probability of a trajectory
and its time-reversal conjugate as a definition and a measure of the time asymmetry in a
thermodynamic system. If we use the work $W$ (which changes sign under time reversal) as a
label for trajectories, then the JSD can be written in terms of work distributions as

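\[ \mathrm{JSD} = \frac{1}{2} \int dW \left[ P(W|F) \ln \frac{2 P(W|F)}{P(W|F) + P(-W|R)} + P(-W|R) \ln \frac{2 P(-W|R)}{P(W|F) + P(-W|R)} \right] \]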
The JSD can be shown [igI to be equal to the average gain of information about the orientation
of time's arrow from one single realization of the experiment. This quantity, plotted against
the average dissipation obtained in the forward and reverse driven experiments, goes to zero
for reversible processes, and to $\ln 2$ nats of information (i.e. 1 bit) when the two
distributions do not overlap (i.e. for large average dissipation). In this latter case, it is easy
to assign an observed trajectory (taken from the pool of forward and reverse non equilibrium
experiments) to one of the two distributions or, stated in other words, it is easy to guess, from
the analysis of one single random trajectory, in which direction time is flowing. On this
basis, when plotted against the average dissipation, the JSD may give an indication of
the energetic cost (i.e. the dissipation needed) to ensure that a molecular process (e.g. a
molecular motor) advances in time. For Markovian (linear) systems, the work distributions
are always Gaussian [5], with variance twice the average dissipation (in $k_B T$ units). In this
case, the JSD vs dissipation curve is analytic and identical for all Markovian systems.
Therefore, the JSD can also be used as a measure of the non linearity of the system.
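
To make the behaviour of the JSD in the Markovian case concrete, the following sketch evaluates the divergence numerically for two Gaussians of equal variance centred at $\Delta F \pm$ (mean dissipation), with variance equal to twice the dissipation in $k_B T$ units ($\beta = 1$). It is only an illustration under these assumptions: the helper names, the uniform grid and its extent are hypothetical choices, and work is measured relative to $\Delta F$.

```python
import math

def normal_pdf(w, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(w - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def jsd_from_densities(p_forward, p_reverse, dw):
    """Jensen-Shannon divergence (nats) of two densities on one uniform grid.

    p_forward[i] is P(W|F) and p_reverse[i] is P(-W|R) at the same W value.
    """
    total = 0.0
    for p, q in zip(p_forward, p_reverse):
        log_m = math.log(0.5 * (p + q)) if (p + q) > 0.0 else 0.0
        if p > 0.0:
            total += 0.5 * p * (math.log(p) - log_m)
        if q > 0.0:
            total += 0.5 * q * (math.log(q) - log_m)
    return total * dw

def jsd_markov(dissipation, n_grid=4001):
    """JSD of the assumed Gaussian model at a given mean dissipation (k_B T)."""
    var = 2.0 * dissipation              # variance = 2 * dissipation for beta = 1
    sigma = math.sqrt(var)
    lo = -dissipation - 10.0 * sigma     # grid covers both peaks generously
    hi = dissipation + 10.0 * sigma
    dw = (hi - lo) / (n_grid - 1)
    grid = [lo + i * dw for i in range(n_grid)]
    forward = [normal_pdf(w, +dissipation, var) for w in grid]
    reverse = [normal_pdf(w, -dissipation, var) for w in grid]
    return jsd_from_densities(forward, reverse, dw)

if __name__ == "__main__":
    for d in (0.5, 2.0, 4.0, 8.0, 13.0):
        print(d, jsd_markov(d), math.log(2.0))
```

The same `jsd_from_densities` helper could be applied to normalized histograms of measured forward and reverse work values, provided both are binned on the same grid.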

In the context of non equilibrium thermodynamics, similar concepts were put forward
recently by Kawai, Parrondo and Van den Broeck [17]. For systems perturbed far from equilibrium
through driven forward and time-reversed protocols, they derived a remarkable exact
formula connecting the relative entropy of the two conjugate phase space densities of the system,
measured at the same but otherwise arbitrary point in time, to the average dissipation in
the forward experiment. Exploiting the fact that the system is deterministic and that the
work changes sign under time reversal, and labeling each phase space point in terms of (future)
work, the Kawai-Parrondo-Van den Broeck formula can be straightforwardly written
in terms of work distributions alone as

\[ \langle W \rangle - \Delta F = k_B T \, D\big(P(W|F)\,\|\,P(-W|R)\big) = k_B T \int dW \, P(W|F) \ln \frac{P(W|F)}{P(-W|R)} \qquad (3) \]

Eq. 3 can be easily derived from the Crooks theorem, Eq. 2. The integral on the rhs is the
Kullback-Leibler divergence (KLD) [l^, a non-negative quantity measuring, in information
theory, the expected extra message-length per datum that must be communicated if a
code that is optimal for a given (wrong) distribution $P(-W|R)$ is used, compared to using a
code based on the (true) distribution $P(W|F)$. In general the KLD is not symmetric, i.e. if
$q$, $p$ are two non identical distributions, $D(p\|q) \neq D(q\|p)$. For Markovian systems, however,
the KLD is always symmetric. Moreover, for such systems, $k_B T$ times the KL divergence
can be calculated analytically, yielding the dissipation $\beta\sigma^2/2$, with $\sigma^2$ being the
variance of the Gaussian distribution. The KL divergence between forward and reverse
distributions has the same characteristics as the JSD, both being a measure of the
time asymmetry (i.e. of the possibility of distinguishing in which sense time is flowing)
and of non linearity. However, the KL divergence, as suggested by Kawai et al., could be
effectively used as a tool for obtaining a better upper bound of the free energy than the
average work $\langle W \rangle$. This is so since, according to the chain rule [3], the relative
entropy (or KL divergence) decreases upon coarse graining. An extremely simple coarse graining
scheme could be that of approximating coarse grained histograms of the forward and backward
work distributions with the best linear model satisfying the Crooks theorem. This approach has
been advocated recently by Forney et al. [20] in the context of steered molecular dynamics of
decaalanine in vacuo along the end-to-end distance. These authors, in their so-called FR
method [20], [21], produce a coarse grained histogram with few work measurements in both
directions that are then fitted using a linear (Markovian) model. However, when the driven
coordinates exhibit clear non linear effects (i.e. the noise due to all other "solvent"
coordinates is not white or Gaussian), as is the case for the folding and refolding of small
proteins along the end-to-end distance, then other less simplistic coarse grain schemes could
and should be adopted.
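
For completeness, the step from the Crooks theorem (Eq. 2) to the work-distribution form of the Kawai-Parrondo-Van den Broeck relation (Eq. 3) can be sketched as follows. The last line is the Gaussian special case and assumes a forward distribution with mean $\Delta F + \beta\sigma^2/2$ and variance $\sigma^2$, as appropriate for a Markovian system.

```latex
\begin{align*}
\ln\frac{P(W|F)}{P(-W|R)} &= \beta\,(W-\Delta F)
  && \text{(logarithm of Eq. 2)} \\
k_B T \int dW\, P(W|F)\,\ln\frac{P(W|F)}{P(-W|R)}
  &= \int dW\, P(W|F)\,(W-\Delta F)
   = \langle W\rangle - \Delta F
  && \text{(average over } P(W|F)\text{)} \\
\langle W\rangle - \Delta F &= \frac{\beta\sigma^{2}}{2}
  && \text{(Gaussian case)}
\end{align*}
```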



In this paper we further develop the concepts of time asymmetry and coarse graining
introduced in Refs. [16], [17] by presenting extensive non equilibrium molecular dynamics
simulation data on the unfolding and refolding process of decaalanine in vacuo, performed with
the deterministic Nosé-Hoover thermostat at 300 K. In spite of the fact that decaalanine
in vacuo has been extensively studied in the recent past by non equilibrium computational
techniques [5], [11], [2^, the rationalization and interpretation of the observed data is still a
matter of debate. $\alpha$-helix formation is also important per se and as a paradigm for an
elementary folding/unfolding process.

Our results on decaalanine are interpreted by means of the JSD and KLD quantities introduced
above. We further present a simple and totally general coarse grain model satisfying
the CT, based on the assumption of the occurrence, in the refolding process, of two mutually
exclusive events. Such a simple coarse grain dual model explains many features of the observed
work distributions and can be rationalized with the existence of two competing minima at
low values of the reaction coordinate in decaalanine: one of enthalpic nature (the helix),
easily accessible in the refolding process at low dissipation regimes, and the other of
entropic origin, corresponding to a manifold of misfolded coil structures which emerges at
large dissipation when trying to rapidly refold decaalanine from extended structures. This
view appears to be quite general and is fully consistent with the rugged funnel picture of the
folding process, in the sense that escaping the rugged funnel from below is a much tamer
process than reentering the funnel from above.

The present paper is organized as follows. Sec. II is dedicated to the description of the
system and of the methods used in the non equilibrium simulations. In Sec. III we present
the computer experiment results for the unfolding/refolding process of a single molecule of
decaalanine, along with a discussion focusing on the thermodynamic and microscopic aspects
and on their rationalization in terms of a coarse grain description of general
validity in the protein space. Concluding remarks and future perspectives regarding the
applicability of the presented methodology to real experiments are presented in Sec. IV.

METHODS



In this section we provide the technical details of the steered molecular dynamics
simulations of the alanine decapeptide (A$_{10}$) in vacuo. The unperturbed system is described
with the all-atom force field CHARMM, whose parameters are given in Ref. 22. A constant
temperature of 300 K is imposed through a Nosé-Hoover thermostat [23]. The resulting
deterministic equations of motion are efficiently integrated using a reference system
propagator algorithm [24] with three time steps: 3.0 fs for medium and long range non bonded
interactions (no cut-off is imposed), 1.5 fs for torsional potentials involving hydrogen atoms
and for short-ranged (1-4) non-bonded interactions, and 0.5 fs for stretching and bending
potentials. The non equilibrium computer experiments from a folded ($\alpha$-helical) to an
extended (all trans) structure (called the forward process) and vice versa are done according
to the following scheme proposed by Park and Schulten [11]. The N atom of the N-terminus residue
is constrained to a fixed position and attached to the N atom of the C-terminus through a
stiff harmonic spring (i.e. by adding a stretching potential to the unperturbed Hamiltonian)
of adjustable equilibrium distance of the form

\[ V(t) = \frac{k}{2}\left[\zeta - \zeta(t)\right]^2 \qquad (4) \]

with $\zeta(t)$ being the time-adjustable equilibrium distance allowing the system to move along
the $\zeta$ (end-to-end distance) coordinate. The driven unfolding (and refolding) of A$_{10}$
along $\zeta$ is bound to occur along the $\alpha$-helix axis, by means of a bending constraint
forcing the N atom of the N-terminus, the N atom of the C-terminus and a distant dummy atom at
fixed position to all lie on the axis of the helix. The force constant $k$ of the external
potential used for guiding the processes (Eq. 4) is 400 kcal mol$^{-1}$ Å$^{-2}$. Such a large
value is used to minimize the possible negative impact of the stiff spring approximation [11]
in the calculation of the free energy difference between the initial and final state. The
conjugated time protocols $\Lambda$ and $\bar{\Lambda}$ for the forward and reverse non
equilibrium experiments are defined by setting in Eq. 4 the corresponding time dependent
equilibrium distances $z(t)$ and $\bar{z}(t)$ as

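\[ z(t) = \zeta_i + (\zeta_f - \zeta_i)\,\frac{t}{\tau} = \zeta_i + v(\tau)\,t \]
\[ \bar{z}(t) = \zeta_f + (\zeta_i - \zeta_f)\,\frac{t}{\tau} = \zeta_f - v(\tau)\,t \qquad (5) \]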

where $\zeta_i$ and $\zeta_f$ are the initial and final values of the reaction coordinate, $\tau$
is the total (simulation) time of the non equilibrium experiment and
$v(\tau) = \pm(\zeta_f - \zeta_i)/\tau$ is the (constant) pulling speed. In the present study, in
accordance with previous works [3], [11], we set $\zeta_i = 15.5$ Å and $\zeta_f = 31.5$ Å. The
sampling at fixed value of $\zeta$ is achieved again by using the potential of Eq. 4, with
$\zeta(t) = \zeta_i = 15.5$ Å for constraining the system at the end-to-end distance of the
$\alpha$-helix and with $\zeta(t) = \zeta_f = 31.5$ Å for constraining the system at the
end-to-end distance of the all trans extended structure. In this manner, through ordinary
equilibrium simulations in the canonical ensemble, we sampled, by saving the configuration at
regular intervals of 2 ps, 504 initial phase-space points for the $\alpha$-helix state and 504
initial phase-space points for the extended state. Starting from these points, we then did
forward and reverse non equilibrium molecular dynamics experiments applying the time dependent
potential of Eq. 4 for various time protocols (i.e. at various constant pulling speeds,
corresponding to duration times $\tau$ ranging from 0.021 to 4.2 ns). In particular, for each
time protocol we did 504 forward and 504 reverse non equilibrium experiments, for a total
simulation time of 11.15 μs. All non equilibrium simulations were done in parallel on a 32 node
Intel CPU X9650 cluster using an in-house parallel version of the program orac [25]. The work
done on A$_{10}$ in each of the non equilibrium experiments is calculated through

\[ W = \int_0^{\tau} k\,\big(\zeta - \zeta(t)\big)\,v(\tau)\,dt \qquad (6) \]
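
The linear protocol and the work integral described above can be sketched in a few lines. This is a toy illustration, not the orac implementation: the function names are hypothetical, the trajectory of the end-to-end distance is supplied by the caller, and the work is accumulated as the time integral of the explicit time derivative of the guiding potential, $\partial V/\partial t = k\,[\zeta(t) - \zeta]\,\dot{\zeta}(t)$. This sign convention, which counts the work as positive when the molecule lags behind the moving restraint, is a choice made for this sketch and should be checked against the sign convention adopted for $v(\tau)$.

```python
def equilibrium_distance(t, zeta_start, zeta_end, tau):
    """Linear protocol: the restraint moves from zeta_start to zeta_end in time tau."""
    return zeta_start + (zeta_end - zeta_start) * t / tau

def accumulate_work(times, zeta_traj, zeta_start, zeta_end, tau, k):
    """Trapezoidal estimate of the work, W = integral of dV/dt along a trajectory.

    With V = (k/2) * (zeta - zeta_eq(t))**2 the explicit time derivative is
    dV/dt = k * (zeta_eq(t) - zeta) * v, where v = d(zeta_eq)/dt is constant.
    """
    v = (zeta_end - zeta_start) / tau
    power = [k * (equilibrium_distance(t, zeta_start, zeta_end, tau) - z) * v
             for t, z in zip(times, zeta_traj)]
    work = 0.0
    for i in range(1, len(times)):
        work += 0.5 * (power[i] + power[i - 1]) * (times[i] - times[i - 1])
    return work

if __name__ == "__main__":
    # Toy forward pull: the molecule lags the restraint by a constant 0.01 Angstrom.
    tau, k = 1.0, 400.0                       # arbitrary time unit; kcal/(mol A^2)
    times = [i * 0.001 for i in range(1001)]
    traj = [equilibrium_distance(t, 15.5, 31.5, tau) - 0.01 for t in times]
    print(accumulate_work(times, traj, 15.5, 31.5, tau, k))
```

Swapping `zeta_start` and `zeta_end` gives the time-reversed protocol used for the refolding experiments.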

RESULTS AND DISCUSSION 

In Fig. 1 we show the forward (unfolding) $P(W|F)$ and backward (refolding) $P(-W|R)$
work distributions obtained with the various time protocols by means of the computational
methods described in the previous section. As expected, the two conjugated work distributions
approach each other the longer the duration of the non equilibrium experiment, i.e.
the more reversibly the transformation is done. The conjugated work distributions appear
to meet approximately at the same value of $W = \Delta F$ no matter what time protocol is
used, in full agreement with the Crooks theorem, Eq. 2. The free energy difference $\Delta F$
between the helix ($\zeta = 15.5$ Å) and the extended ($\zeta = 31.5$ Å) structure can be
estimated with rather good accuracy using the Bennett acceptance ratio [26], [27] formula
already starting from $\tau = 0.105$ ns, where the two work distributions overlap significantly.
Using the Bennett formula, we consistently obtain values between 92 and 94 kJ/mol, with an
average value of $\Delta F = 93.3 \pm 0.5$ kJ/mol, in full agreement with previous estimates of
the unfolding free energy of decaalaninej^, [11]. Detailed data regarding $\Delta F$, dissipated
work and variance of the distributions are reported in Table I. From inspection of Fig. 1 and
from the data of Table I we see that, while the forward distribution $P(W|F)$ preserves an
approximately Gaussian shape for all time protocols, the reverse distributions show a markedly
non Gaussian shape at all times. In particular, the reverse distributions are characterized by
a long tail that, for $\tau < 0.1$ ns and $\tau > 0.2$ ns, lies on the right and on the left of
the maximum of the distribution, respectively. As we shall see later in the discussion, this
peculiar behaviour of the non equilibrium refolding of A$_{10}$ is a signature of competing
mutually exclusive events, i.e. the formation of the $\alpha$-helix (for $W > 60$ kJ mol$^{-1}$,
i.e. at low dissipation) on the one hand and the evolution towards misfolded structures (for
$W < 25$ kJ mol$^{-1}$, i.e. at high dissipation) on the other hand.
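
The Bennett acceptance ratio estimate mentioned above can be illustrated with a short, self-contained sketch. It solves the standard implicit equation $\sum_F f(M + \beta(W_F - \Delta F)) = \sum_R f(-M + \beta(W_R + \Delta F))$, with $f(x) = 1/(1+e^{x})$ and $M = \ln(n_F/n_R)$, by bisection. Here $W_R$ denotes the work done in the reverse experiment, so that $-W_R$ is the quantity histogrammed in $P(-W|R)$. The function names, the bracketing interval and the synthetic Gaussian data are assumptions of this example and do not reproduce the actual procedure or data of the study.

```python
import math
import random

def fermi(x):
    """Numerically stable evaluation of 1 / (1 + exp(x))."""
    if x > 0.0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def bennett_delta_f(work_forward, work_reverse, beta=1.0, n_iter=100):
    """Free energy difference from bidirectional work values (bisection solver)."""
    n_f, n_r = len(work_forward), len(work_reverse)
    m = math.log(n_f / n_r)

    def imbalance(delta_f):
        lhs = sum(fermi(m + beta * (w - delta_f)) for w in work_forward)
        rhs = sum(fermi(-m + beta * (w + delta_f)) for w in work_reverse)
        return lhs - rhs          # increases monotonically with delta_f

    # Bracket chosen so that the imbalance is negative at lo and positive at hi.
    lo = min(min(work_forward), -max(work_reverse)) - 50.0 / beta
    hi = max(max(work_forward), -min(work_reverse)) + 50.0 / beta
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Synthetic Gaussian work values in k_B T units (beta = 1), for illustration only.
    random.seed(1)
    delta_f, dissipation = 3.0, 2.0
    sigma = math.sqrt(2.0 * dissipation)
    w_f = [random.gauss(delta_f + dissipation, sigma) for _ in range(504)]
    w_r = [random.gauss(-delta_f + dissipation, sigma) for _ in range(504)]
    print(bennett_delta_f(w_f, w_r))
```

When the work values are expressed in kJ/mol, `beta` should be set to $1/(k_B T)$ in the same units.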

The asymmetry in the behaviour of the $P(W|F)$ and $P(-W|R)$ distributions in A$_{10}$ is shown
in Fig. 2, where we report the forward and reverse dissipation (see also Table I) against the
duration time $\tau$ of the non equilibrium forward and reverse experiments. The reverse process
is consistently more dissipative than the forward one for all duration times. Beyond 1 ns, the
dissipation for the reverse and forward processes becomes identical and small compared to
$\Delta F$, indicating that the non equilibrium experiments are performed in conditions of
quasi-reversibility. Such behaviour of the dissipation of the refolding process vs the duration
time of the non equilibrium experiment could have been easily guessed directly from Fig. 1 by
following the trends of the maxima of the distributions as a function of the duration time
of the experiments. The "transition time" between reversible and quasi-reversible regimes
(approximately falling between 0.8 and 1.5 ns) must be ultimately connected either to the
non equilibrium time protocol or to some inherent structural property (e.g. the potential of
mean force along $\zeta$) and dynamical property (e.g. friction and diffusion coefficients
along $\zeta$) of the system under investigation or, again, to both.

In order to assess the time asymmetry and non linearity of A$_{10}$, the data are used to
compute the Jensen-Shannon divergence as a function of the average dissipation
$\frac{1}{2}(\langle W \rangle_F + \langle W \rangle_R)$ in $k_B T$ units. To this aim we directly
use the work-distribution expression of the JSD given in the Introduction, which can be applied
to the work data without any a priori knowledge of $\Delta F$. The results (triangle symbols) are
reported in Fig. 3. The solid line corresponds to the universal Markov model, where $P(W|F)$ and
$P(-W|R)$ are normal distributions with equal variance, and with variance and dissipation
related by $\langle W \rangle - \Delta F = \beta\sigma^2/2$. As expected, the JSD closely follows
the Markov model for average dissipation below 4 $k_B T$, i.e. when the process is
quasi-reversible, and above 13 $k_B T$, i.e. when the two distributions have negligible overlap
and the time asymmetry approaches its limiting value of $\ln 2$ nats. Remarkably, this limiting
value of the JSD is reached at an average dissipation that is close to the corresponding
dissipation that can be extrapolated from the experimental data on the unfolding/refolding of
an RNA molecule reported recently by Feng and Crooks (see Fig. 2 of Ref. [16]). This suggests a
universal behaviour of the JSD in real systems, thereby strengthening the idea [16] that the
encoding cost to ensure that a molecular process advances in time amounts to a few $k_B T$,
being rather insensitive to the specificity of the molecular process itself.

In spite of the non linearity (see Table I and Fig. 2) of A$_{10}$ along $\zeta$, especially
evident at intermediate dissipation regimes (i.e. for $0.1 < \tau < 0.5$ ns), the Jensen-Shannon
divergence vs dissipation appears quite insensitive to such non linearity, being nearly
indistinguishable from the universal Markov Jensen-Shannon divergence (solid line) for all
dissipation energies. We must stress here that the error in the Jensen-Shannon divergence vs
dissipation is small (with error bars of the order of the height of the triangle symbols in
Fig. 3), reflecting the small error in the determination of $\Delta F$ itself. In order to show
this more quantitatively, we have also calculated the JSD using the alternative Eq. 7 of
Ref. [16], which requires a prior knowledge of $\Delta F$ (through, e.g., Bennett's method). As
one can see from Fig. 3, the JSD calculated with this method (circle symbols) closely follows
that of the direct method. The insensitivity of the JSD vs the mean dissipation to the non
linearity of the system is probably due to the fact that the JSD itself is a symmetrized
average of the two Kullback-Leibler divergences of the forward and reverse distributions with
respect to the average of the two distributions.

The non linearity of the folding/unfolding process of A$_{10}$ for $\tau < 1$ ns does not allow
the use of the simple Markov approach to satisfactorily reproduce, in this time range, the
observed distributions and at the same time satisfy the CT. As an example of such inability,
in Fig. 4 we show the best Markov model fitting the data for $\tau = 0.105$ ns and
$\tau = 0.21$ ns and at the same time satisfying the Crooks theorem with
$\Delta F = 93.3$ kJ mol$^{-1}$. The inadequacy of the Markov model is not surprising, since the
driven end-to-end distance is not a "good" coordinate, i.e. the modulations of the remaining
("solvent") coordinates on $\zeta$ do not produce a white noise. The memory effects in $\zeta$
(see e.g. Fig. 2) indicate that there must be some other important orthogonal coordinate
besides $\zeta$ that should be included in the model. We stress here that the pure Markov model
is a coarse graining of the information regarding the microscopic detail of the process, in the
sense that one attempts to describe the full information given by the experimental histograms
of the forward and reverse work with two CT-related Gaussians. This elementary coarse graining
is the so-called FR model [21].

In an effort to go beyond the simple Markov model or FR model, following Feng
and Crooks [16], we now assume that the forward and reverse true distributions $P(W|F)$,
$P(-W|R)$ can be approximated by the distributions $\mathcal{P}(W|F)$, $\mathcal{P}(-W|R)$, each
given by a linear combination of two normal distributions, i.e.

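\[ \mathcal{P}(W|F) = p\,\mathcal{N}(w_1,\sigma_1) + (1-p)\,\mathcal{N}(w_2,\sigma_2) \]
\[ \mathcal{P}(-W|R) = q\,\mathcal{N}(w_1 - \beta\sigma_1^2,\sigma_1) + (1-q)\,\mathcal{N}(w_2 - \beta\sigma_2^2,\sigma_2) \qquad (7) \]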

where $\mathcal{N}(w,\sigma)$ is a normal distribution with mean $w$ and variance $\sigma^2$, and
$0 < p < 1$. The form of $\mathcal{P}(-W|R)$ is a trivial consequence of the CT, Eq. 2. Eq. 7
implies that the non equilibrium process is described by two mutually exclusive events
occurring with probability $p$ and $(1-p)$ in the forward process and $q$ and $(1-q)$ in the
reverse process, with mean dissipation satisfying the Crooks theorem. We stress here that such
a bimodal scheme is also a coarse graining of the full available microscopic information
provided by the experimental histograms $P(W)$. The model can of course be extended by
combining an arbitrary number of normal distributions, allowing for many competing events.
However, we shall see in the forthcoming discussion that the simple bimodal scheme, Eq. 7,
captures the essential features of the "experimental" distributions based on the full
microscopic information.

Going back to Eq. 7, the probabilities $p$ and $q$ are not free parameters, as the CT and
the normalization condition set a twofold constraint on the coefficients of the combinations.
In fact, by using the Crooks theorem, Eq. 2, in Eq. 7, the condition of normalization on the
probability densities $\mathcal{P}(W|F)$ and $\mathcal{P}(-W|R)$ requires that

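\[ p = \frac{1 - e^{\beta(\Delta F - w_2 + \frac{1}{2}\beta\sigma_2^2)}}{e^{\beta(\Delta F - w_1 + \frac{1}{2}\beta\sigma_1^2)} - e^{\beta(\Delta F - w_2 + \frac{1}{2}\beta\sigma_2^2)}} \qquad (8) \]

\[ q = \frac{\left(1 - e^{\beta(\Delta F - w_2 + \frac{1}{2}\beta\sigma_2^2)}\right) e^{\beta(\Delta F - w_1 + \frac{1}{2}\beta\sigma_1^2)}}{e^{\beta(\Delta F - w_1 + \frac{1}{2}\beta\sigma_1^2)} - e^{\beta(\Delta F - w_2 + \frac{1}{2}\beta\sigma_2^2)}} \qquad (9) \]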

Since $p$ and $q$ are probabilities, not all the values of the free parameters $w_1$,
$\sigma_1$, $w_2$, $\sigma_2$ are allowed. We now define the variables
$x = \Delta F - w_1 + \frac{1}{2}\beta\sigma_1^2$ and $y = \Delta F - w_2 + \frac{1}{2}\beta\sigma_2^2$.
In Fig. 5 we plot the functions $p(x,y)$ and $q(x,y)$ on the domain of the variables $x$ and $y$
for which $0 < p < 1$ and $0 < q < 1$. As is well known, the allowed values for the variance and
the mean in a pure Markov model obeying the Crooks theorem lie on the line
$w = \Delta F + \beta\sigma^2/2$. Analogously, Fig. 5 represents the two dimensional domain set
by the CT for a bimodal Markov model. When $p = 1$, then
$e^{\beta(\Delta F - w_1 + \frac{1}{2}\beta\sigma_1^2)} = 1$, such that
$w_1 - \Delta F = \frac{1}{2}\beta\sigma_1^2$ and $q = 1$, thus recovering the single
Gaussian Markov model.
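
The constraint on the mixing coefficients is easy to explore numerically. The sketch below, with hypothetical function names, evaluates $p$ and $q$ as functions of the variables $x$ and $y$ defined above, using the equivalent compact forms $p = (1 - e^{\beta y})/(e^{\beta x} - e^{\beta y})$ and $q = p\,e^{\beta x}$, and tests whether a given pair $(x, y)$ falls in the allowed domain. A simple consequence of these expressions is that both coefficients lie strictly between 0 and 1 exactly when $x$ and $y$ have opposite signs.

```python
import math

def mixing_probabilities(x, y, beta=1.0):
    """Return (p, q) for x = dF - w1 + beta*s1^2/2 and y = dF - w2 + beta*s2^2/2."""
    ex = math.exp(beta * x)
    ey = math.exp(beta * y)
    if ex == ey:
        raise ValueError("x and y must differ for the two-component model")
    p = (1.0 - ey) / (ex - ey)   # from the normalization p*e^{bx} + (1-p)*e^{by} = 1
    q = p * ex                   # weight of the first component in the reverse mixture
    return p, q

def is_allowed(x, y, beta=1.0):
    """True if both p and q are valid probabilities strictly between 0 and 1."""
    p, q = mixing_probabilities(x, y, beta)
    return 0.0 < p < 1.0 and 0.0 < q < 1.0

if __name__ == "__main__":
    for x, y in ((-1.0, 1.0), (1.0, -1.0), (1.0, 2.0), (-0.5, -2.0)):
        print(x, y, mixing_probabilities(x, y), is_allowed(x, y))
```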

We now adopt the model based on the coarse grain bimodal representation, Eq. 7, in order
to reproduce the true work distributions. In Table II we report the parameters obtained
from the fit using the bimodal distributions as a function of the duration time. The loss
of information due to coarse graining with respect to the true (measured) distribution is
measured by the KL divergence (Eq. 3) between the mean of the true distributions,
$P(W|F) + P(-W|R)$, and the mean of the (absolutely continuous) coarse grain distributions,
$\mathcal{P}(W|F) + \mathcal{P}(-W|R)$. Large values of the KL divergence mean a great loss of
information in the coarse graining.
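
A minimal sketch of the bimodal construction is given below, assuming that the second argument of each normal distribution in Eq. 7 is a standard deviation. Given trial parameters $(w_1, \sigma_1, w_2, \sigma_2)$ and $\Delta F$, it builds the forward and reverse mixtures with the CT-constrained coefficients and provides a discretized KL divergence that could serve as the information-loss measure against binned data. All names are hypothetical, and the fitting procedure itself (the search over the four parameters) is not shown.

```python
import math

def normal_pdf(w, mean, sigma):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((w - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bimodal_model(w1, s1, w2, s2, delta_f, beta=1.0):
    """Return (p, q, forward_pdf, reverse_pdf) for the two-Gaussian model."""
    ex = math.exp(beta * (delta_f - w1 + 0.5 * beta * s1 ** 2))
    ey = math.exp(beta * (delta_f - w2 + 0.5 * beta * s2 ** 2))
    p = (1.0 - ey) / (ex - ey)   # CT plus normalization fix p and q
    q = p * ex

    def forward_pdf(w):
        return p * normal_pdf(w, w1, s1) + (1.0 - p) * normal_pdf(w, w2, s2)

    def reverse_pdf(w):
        # Density of P(-W|R), written as a function of W.
        return (q * normal_pdf(w, w1 - beta * s1 ** 2, s1)
                + (1.0 - q) * normal_pdf(w, w2 - beta * s2 ** 2, s2))

    return p, q, forward_pdf, reverse_pdf

def kl_divergence(p_vals, q_vals, dw):
    """Discretized KL divergence (nats) on a common uniform grid.

    Assumes q_vals[i] > 0 wherever p_vals[i] > 0 (absolute continuity).
    """
    return sum(p * math.log(p / q) for p, q in zip(p_vals, q_vals) if p > 0.0) * dw

if __name__ == "__main__":
    # Illustrative parameters in k_B T units, chosen so that x = -1 and y = +1.
    p, q, fwd, rev = bimodal_model(3.0, 2.0, 1.0, 2.0, delta_f=0.0)
    for w in (0.5, 2.0, 4.0):
        # The log ratio should reproduce beta * (W - delta_f), i.e. the CT.
        print(w, math.log(fwd(w) / rev(w)))
```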

We see that the bimodal approach, Eq. 7, has consistently smaller KL divergences than the
purely Markov model (shown in Table III) at all times. A visual example of the surprising
accuracy of Eq. 7 in reproducing the true distributions is shown in Fig. 5, where the true
distributions and the bimodal distributions of Eq. 7 are compared for various short and
intermediate duration times. By inspection of Table II one can see that the relative
probability of the two mutually exclusive events underlying the reverse distribution depends
on the rate at which the non equilibrium experiment is done. At short duration times
($\tau = 0.021$ ns), the highly dissipative events in the refolding of the alanine decapeptide
are overwhelmingly more likely than the non dissipative events, with most of the "refolding"
trajectories producing a misfolded structure with $\zeta = 15.5$ Å. When the rate of the non
equilibrium experiment is slower (e.g. at $\tau = 0.21$ or $\tau = 0.3$ ns), the two competing
events (misfolding vs folding) become of comparable probability. As expected, in the unfolding
process, for all duration times, the dissipative event has consistently a much larger
probability than the non dissipative event (see Table II). In fact, while on the one hand the
misfolding of A$_{10}$ starting from an extended structure is a probable outcome in a fast
refolding process, on the other hand in the non equilibrium unfolding process it is not so
likely to disrupt the helix doing less work than the needed reversible work. Remarkably, the CT
automatically balances these mutually dependent probabilities $p$ and $q$ through Eq. 9.

The results that we have presented show that a coarse grain scheme based on only two
mutually exclusive linear events, yielding a work distribution that is a combination of two
Gaussian distributions with linear coefficients satisfying the Crooks theorem, explains the
observed non linear work distributions at short and intermediate times with surprisingly
good accuracy. The success of the bimodal approach in reproducing the essential features
of the true distributions at short and intermediate duration times of the non equilibrium
experiments allows us to sketch an elementary microscopic picture: in the forward direction
only one path is possible and the process is approximately Markovian. In the reverse direction
(refolding), several other metastable minima at $\zeta = 15.5$ Å can be visited, depending
on the dissipation (i.e. on the duration time of the experiment). At very fast rates, virtually
no hydrogen bond has time to form and a misfolded coil structure is systematically
formed. At intermediate rates more paths are possible towards variously misfolded structures
(including distorted helices), with a probability balance between these paths that depends on
the duration time (i.e. on the mean dissipation): the slower the process, the smaller
the dissipation and the larger the fraction of refolding trajectories producing the helix. At
duration times between 0.6 and 2 ns, the refolding process ends up mostly in the helical
structure. For duration times beyond 3 ns, refolding is virtually non dissipative (reversible)
and only the $\alpha$-helix minimum is visited. The existence of the misfolded minima at
$\zeta = 15.5$ Å, which have a negligible probability at canonical equilibrium, emerges in the
refolding process in the fast pulling/large dissipation regime. The rare event in these
dissipative regimes is the correct folding of decaalanine to the enthalpic minimum. As the
regime becomes less and less dissipative (i.e. more reversible), the rare event becomes the
formation of misfolded coils. The dissipation in the refolding process, which can be simply
modulated by varying the duration time of the non equilibrium experiment, may thus be a means
to "see" minima in the folded (or native) structure that are hard to detect at equilibrium.



CONCLUSION 



In this paper we have studied the Alanine decapeptide in vacuo at 300 K and analyzed its 
behaviour in driven out-of-equilibrium classical molecular dynamics simulations. Applying
an external potential, we produced classical trajectories starting from the α-helix structure
and ending in a fully extended all-trans structure, and vice versa. The bidirectional
non-equilibrium experiments were done at various pulling rates, with duration times ranging
from 0.021 ns to 4.2 ns. For each bidirectional experiment at a given pulling rate we calculated
the forward and reverse work distributions and applied the Bennett acceptance ratio to estimate
the free energy difference between the folded and unfolded states, thus evaluating the dissipative
work spent during the non-equilibrium processes. We found that the unfolding/refolding
process is markedly non-Markovian for duration times $\tau < 1$-$1.5$ ns and that in such a pulling
rate regime the reverse process consistently dissipates more than its forward counterpart.
For duration times $\tau > 2$ ns, the system becomes reversible, exhibiting equal forward and
reverse mean dissipation $W_d = \beta\sigma^2/2$, with $\sigma^2$ being the variance of two identical
normal distributions. Using our non-equilibrium trajectories of A$_{10}$ and the corresponding work
distributions, we have measured the Jensen-Shannon divergence as a function of the mean
forward and reverse dissipation. This quantity is a convenient metric for the "irreversibility"
of the system, i.e. of how reliably one can tell, from a single random realization of the
experiment at a given mean dissipation, in which direction time is flowing.
Remarkably, the behaviour of the Jensen-Shannon divergence for the alanine decapeptide
in vacuo closely follows that observed in a recent single RNA molecule experiment [10],
thereby strengthening the recently proposed idea [6] that the encoding cost to ensure that
a molecular process advances in time is independent of the system and amounts to 4-10 $k_B T$.
In the case of the alanine decapeptide, which shows a strongly non-linear behaviour, the
Jensen-Shannon divergence (JSD) plotted against the dissipation has nonetheless been found to
approximately follow the JSD of a purely Markovian model. Such surprising insensitivity of
the JSD vs dissipation curve to non-linearity is yet another confirmation of its universal character.
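
As an illustration of the Markovian reference curve discussed above, the following minimal sketch evaluates the Jensen-Shannon divergence between a Gaussian forward work distribution and its Crooks-conjugate reverse distribution as a function of the mean dissipation. It assumes $W_d = \beta\sigma^2/2$ in both directions and works in units of $k_B T$. The function name `gaussian_jsd`, the grid-based integration and the choice of nats are assumptions of this sketch, not the procedure used for the results reported here.

```python
import numpy as np

def gaussian_jsd(mean_dissipation_kT, n_grid=20001):
    """Jensen-Shannon divergence (nats) for the Gaussian (Markovian) model.

    mean_dissipation_kT is d = beta * W_d, assumed equal in both directions.
    In the variable omega = beta * (W - Delta F) the forward density is
    N(d, 2d) and the conjugate reverse density P_R(-W) is N(-d, 2d), so that
    their ratio is exp(omega), as required by the Crooks theorem.
    """
    d = float(mean_dissipation_kT)
    if d <= 0.0:
        return 0.0  # no dissipation: the two distributions coincide
    var = 2.0 * d
    sd = np.sqrt(var)
    omega = np.linspace(-d - 10.0 * sd, d + 10.0 * sd, n_grid)
    step = omega[1] - omega[0]
    log_norm = -0.5 * np.log(2.0 * np.pi * var)
    log_pf = log_norm - (omega - d) ** 2 / (2.0 * var)
    log_pr = log_norm - (omega + d) ** 2 / (2.0 * var)
    # log of the equal-weight mixture M = (P_F + P_R) / 2
    log_m = np.logaddexp(log_pf, log_pr) - np.log(2.0)
    kl_f = np.sum(np.exp(log_pf) * (log_pf - log_m)) * step
    kl_r = np.sum(np.exp(log_pr) * (log_pr - log_m)) * step
    return 0.5 * (kl_f + kl_r)

if __name__ == "__main__":
    for d in (0.5, 2.0, 5.0, 10.0):
        print(d, gaussian_jsd(d))
```

In this model the divergence vanishes at zero dissipation and saturates at ln 2 (one bit) for large dissipation, which is the regime in which a single realization suffices to identify the direction of time.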

The observed forward and reverse work distributions in A$_{10}$ cannot be fitted satisfactorily
at fast and intermediate pulling speeds with normal distributions satisfying the Crooks
theorem, reflecting the fact that the process in such regimes is non-Markovian
(i.e. the end-to-end coordinate exhibits memory effects). Alanine decapeptide behaves
linearly only for sufficiently slow pulling rates ($\tau > 1$ ns). Following a suggestion by Feng
and Crooks [6], we thus fitted the observed distributions using a combination of two normal
distributions. This approach implies that both the forward and the reverse process can be
described by two, rather than one, solvent modulated processes, whose relative probability (i.e.
the ratio of the linear coefficients of the combination) is regulated by the Crooks theorem and
by the pulling rate of the non-equilibrium experiment. We found that such a simple model
can reproduce with surprising accuracy the observed distributions at fast and intermediate
pulling rates. At fast rates the reverse distribution has an overwhelmingly large component
from the normal distribution with mean corresponding to large dissipation, with a negligible
contribution arising from a rare non-dissipative event corresponding to refolding into the
α-helix structure. For short and intermediate duration times, the "refolding" of A$_{10}$ has
a high chance of failing, producing a manifold of misfolded structures. As the duration time
grows, the likelihood of the non-dissipative process (i.e. the correct refolding) grows as well.
The break-even point for the likelihood of the two events in the reverse driven process occurs
between duration times of 0.2 and 0.3 ns. The above results suggest a possible route in real
experiments on single molecules using, e.g., an optical trap apparatus to detect metastable
states. In fast pulling experiments, the extra energy implied by the large dissipation allows
the system to visit states that are hard to reach in a driven quasi-reversible experiment. In the
presence of two competing minima, one could then use the dual Markov model extrapolated from
a few bidirectional work measurements both to achieve, through the KL divergence and its
connection to the dissipation, a better estimate of the free energy difference between the final
and initial states, and to identify secondary metastable minima at fixed driven coordinate that
are difficult to evaluate either because of the presence of high barriers or because they lie
several $k_B T$ above the principal (native) structure.
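
The constraint that the Crooks theorem places on a two-Gaussian model can be sketched as follows. The code assumes that the forward distribution is a mixture of two normal distributions with weight `p`, that the Crooks theorem is applied in its work-distribution form $P_R(-W) = P_F(W)\,e^{-\beta(W-\Delta F)}$, and that the spreads are supplied as variances. The function names, the returned dictionary and the closed form for `p` (obtained here from the normalization of the reverse distribution) are choices of this sketch and may differ from the fitting procedure actually used.

```python
import numpy as np

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def normal_pdf(w, mean, var):
    """Normal density with the given mean and variance."""
    return np.exp(-(w - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def bimodal_crooks(w1, var1, w2, var2, delta_f, temperature=300.0):
    """Forward weight p and the reverse mixture implied by the Crooks theorem.

    Forward:  P_F(W)  = p N(W; w1, var1) + (1 - p) N(W; w2, var2)
    Reverse:  P_R(-W) = P_F(W) exp(-beta (W - delta_f))
    Each Gaussian keeps its variance, its mean shifts by -beta * var, and it
    picks up a factor exp(beta * x) or exp(beta * y) (x, y defined below).
    """
    beta = 1.0 / (R_KJ * temperature)
    x = delta_f - w1 + 0.5 * beta * var1
    y = delta_f - w2 + 0.5 * beta * var2
    ex, ey = np.exp(beta * x), np.exp(beta * y)
    # Normalization of P_R requires p * ex + (1 - p) * ey = 1.
    p = (1.0 - ey) / (ex - ey)
    if not 0.0 <= p <= 1.0:
        raise ValueError("no valid mixture weight for these parameters")
    reverse = {
        "weights": (p * ex, (1.0 - p) * ey),
        "means": (w1 - beta * var1, w2 - beta * var2),
        "vars": (var1, var2),
    }
    return p, reverse, beta

if __name__ == "__main__":
    # Forward parameters of the 0.300 ns row of Table II (kJ/mol), with the
    # spread columns read as variances; delta_f = 93.3 kJ/mol is the value
    # quoted in the caption of Fig. 3. Both choices are illustrative.
    p, reverse, beta = bimodal_crooks(107.78, 65.06, 115.62, 149.68, 93.3)
    print(p, reverse)
```

A valid weight exists only when `x` and `y` have opposite signs, i.e. when one component lies on the dissipative side and the other on the non-dissipative side of the free energy difference. The resulting weights are very sensitive to the value used for `delta_f`, whereas the reverse means depend only on the forward means and variances.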
Tables



τ (ns)   ΔF      <W_d^f>   σ_f^2    <W_d^r>   σ_r^2
0.021    86.0    83.2      245.6    82.8      67.7
0.042    84.2    60.4      174.6    72.9      142.0
0.063    94.0    40.6      133.4    75.2      282.4
0.084    92.3    34.7      114.0    65.3      406.2
0.105    93.8    28.6      105.8    59.5      489.5
0.150    93.9    22.8      75.6     46.6      507.8
0.210    93.7    18.6      69.4     35.6      434.0
0.300    93.9    14.1      54.4     25.6      303.9
0.420    92.9    12.0      52.1     18.3      204.4
0.630    93.0    8.7       42.8     11.9      105.9
0.840    93.2    7.1       32.6     9.2       79.4
0.930    93.6    6.3       30.1     8.4       88.3
1.050    93.2    5.8       24.8     7.9       85.5
2.100    93.4    3.2       16.3     3.9       40.4
4.200    93.2    1.7       9.7      2.3       33.4

TABLE I: Salient data of the work distributions in alanine decapeptide in vacuo at 300 K. For
each duration time $\tau$, the forward and reverse work distributions have been calculated using 504
trajectories. $\Delta F$ is the free energy difference between the final (all-trans extended structure,
$\zeta$ = 31.5 Å) and the initial (α-helix structure, $\zeta$ = 15.5 Å) states, computed using the Bennett
acceptance ratio on the 1008 forward and reverse trajectories. $\langle W_d^f \rangle$, $\langle W_d^r \rangle$ and
$\sigma_f^2$, $\sigma_r^2$ are the mean dissipated work and the variance of the forward and reverse work
distributions, respectively.
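
For readers who wish to reproduce a free energy estimate of the kind described in the caption of Table I, the following is a minimal sketch of the Bennett acceptance ratio for equal numbers of forward and reverse work measurements (504 each in this work). It solves the self-consistent condition that follows from the Crooks theorem, namely that the average of $1/(1+e^{\beta(W-\Delta F)})$ over forward runs equals the average of $1/(1+e^{\beta(W+\Delta F)})$ over reverse runs. The function names, the bisection solver and the sign convention (reverse work recorded as measured along the reverse process) are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def fermi(x):
    """Numerically stable evaluation of 1 / (1 + exp(x))."""
    return np.exp(-np.logaddexp(0.0, x))

def bar_free_energy(w_forward, w_reverse, beta, tol=1e-10, max_iter=200):
    """Bennett acceptance ratio estimate of Delta F for equal sample sizes.

    w_forward: work values measured in the forward (unfolding) runs.
    w_reverse: work values measured in the reverse (refolding) runs,
               with the sign they have in the reverse process.
    """
    wf = np.asarray(w_forward, dtype=float)
    wr = np.asarray(w_reverse, dtype=float)
    if wf.size != wr.size:
        raise ValueError("this sketch assumes equal numbers of forward and reverse runs")

    def imbalance(delta_f):
        # Monotonically increasing in delta_f; its root is the BAR estimate.
        return fermi(beta * (wf - delta_f)).sum() - fermi(beta * (wr + delta_f)).sum()

    # Bracket chosen wide enough that imbalance is negative at lo, positive at hi.
    lo = min(wf.min(), (-wr).min()) - 50.0 / beta
    hi = max(wf.max(), (-wr).max()) + 50.0 / beta
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Synthetic Gaussian work values obeying the Crooks theorem (illustrative only).
    rng = np.random.default_rng(0)
    beta = 1.0 / (8.314462618e-3 * 300.0)   # mol/kJ at 300 K
    delta_f, sigma = 93.3, 4.0              # kJ/mol
    w_d = 0.5 * beta * sigma ** 2           # Gaussian mean dissipation
    wf = rng.normal(delta_f + w_d, sigma, 504)
    wr = rng.normal(-delta_f + w_d, sigma, 504)
    print(bar_free_energy(wf, wr, beta))
```

The mean dissipated work in each direction then follows as the mean forward work minus the estimate, and the mean reverse work plus the estimate.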
Forward distribution (p, w_1, σ_1, w_2, σ_2)    Reverse distribution (q, w_3, σ_3, w_4, σ_4)





τ (ns)  KL     p      w_1      σ_1      w_2      σ_2      | q      w_3     σ_3      w_4     σ_4
0.021   0.38   1.00   167.38   314.83   21.44    47.49    | 0.99   2.40    47.49    41.15   314.83
0.042   0.55   1.00   143.35   222.69   31.04    60.63    | 0.90   6.73    60.63    54.06   222.69
0.063   0.48   1.00   134.87   187.48   41.19    72.37    | 0.82   12.17   72.37    59.69   187.48
0.084   0.60   1.00   126.43   144.10   99.73    199.01   | 0.84   19.94   199.01   68.65   144.10
0.105   0.43   1.00   122.30   126.78   118.34   231.15   | 0.79   25.65   231.15   71.47   126.78
0.150   0.51   0.99   115.49   97.68    131.40   240.41   | 0.69   35.00   240.41   76.32   97.68
0.210   0.42   0.98   111.76   82.83    124.30   199.57   | 0.59   44.28   199.57   78.55   82.83
0.300   0.33   0.97   107.78   65.06    115.62   149.68   | 0.51   55.60   149.68   81.69   65.06
0.420   0.41   0.97   104.62   52.95    113.55   133.20   | 0.65   83.39   52.95    60.14   133.20
0.630   0.31   0.99   102.49   44.96    106.00   97.50    | 0.82   84.46   44.96    66.91   97.50
0.840   0.37   1.00   102.07   44.96    113.08   148.18   | 0.98   84.04   44.96    53.67   148.18
0.930   0.27   1.00   100.17   35.42    99.68    123.54   | 0.97   85.97   35.42    50.15   123.54
1.050   0.31   1.00   99.40    31.53    102.81   124.17   | 0.97   86.76   31.53    53.02   124.17
2.100   0.27   0.99   96.26    16.09    115.83   123.66   | 0.98   89.81   16.09    66.25   123.66
4.200   0.14   0.89   94.56    7.41     96.67    21.06    | 0.86   91.59   7.41     88.23   21.06

TABLE II: Best-fit parameters for the true forward and reverse distributions, using a bimodal
distribution, Eq. 7. $\tau$ is the duration time of the non-equilibrium experiment in ns. KL is
the Kullback-Leibler divergence (in kJ units) between the sum of the true forward and reverse
distributions and the sum of the fitted bimodal forward and reverse distributions. $w_3 = w_1 - \beta\sigma_1^2$
and $w_4 = w_2 - \beta\sigma_2^2$ are the mean values of the normal distributions of the reverse process.
For the meaning of the other symbols see text. Units of energy are kJ mol$^{-1}$.

τ (ns)  KL     w       σ
0.021   2.20   177.6   421.811
0.042   5.14   176.6   416.912
0.063   6.16   172.2   395.072
0.084   7.25   130.5   187.082
0.105   7.06   125.9   164.197
0.150   7.98   117.7   123.593
0.210   5.13   114.1   105.130
0.300   3.23   109.5   81.816
0.420   3.01   105.6   62.393
0.630   1.44   102.2   45.533
0.840   1.01   100.3   36.747
0.930   1.95   98.9    29.519
1.050   1.80   99.4    31.581
2.100   1.92   96.1    15.379
4.200   0.32   94.7    8.348

TABLE III: Best-fit parameters of the true forward and reverse distributions using the linear model.
KL (kJ mol$^{-1}$) is the Kullback-Leibler divergence between the sum of the true forward and reverse
distributions and the sum of the fitted Gaussian forward and reverse distributions.
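
The fit quality measure quoted in the tables is a Kullback-Leibler divergence between observed and fitted distributions. A generic way to evaluate such a divergence between a work histogram and a model density on the same bins is sketched below. The function name `kl_divergence`, the binning and the dimensionless (nats) normalization are assumptions of this sketch, so its values are not directly comparable with the KL columns of Tables II and III.

```python
import numpy as np

def kl_divergence(p_counts, q_density, bin_edges):
    """Discrete KL(P || Q) in nats between a histogram and a model density.

    p_counts:  histogram counts (or weights) of the observed work values.
    q_density: callable returning the model density at given work values.
    bin_edges: edges of the histogram bins (len(p_counts) + 1 values).
    """
    edges = np.asarray(bin_edges, dtype=float)
    widths = np.diff(edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = np.asarray(p_counts, dtype=float)
    p = p / p.sum()                       # observed bin probabilities
    q = q_density(centers) * widths       # model bin probabilities (midpoint rule)
    q = q / q.sum()
    mask = p > 0.0                        # empty bins contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    work = rng.normal(100.0, 6.0, 504)    # synthetic work values, kJ/mol
    counts, edges = np.histogram(work, bins=30)
    model = lambda w: np.exp(-(w - 100.0) ** 2 / 72.0) / np.sqrt(72.0 * np.pi)
    print(kl_divergence(counts, model, edges))
```

The divergence is zero when the model reproduces the histogram exactly and grows as the fitted density moves away from the observed one, which is the sense in which the bimodal fit and the single-Gaussian fit are compared.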
Figure Captions



Fig. 1 Forward (on the right, in brown) and backward (on the left, in black) work distributions
for alanine decapeptide in vacuo at 300 K obtained in non-equilibrium
experiments of various duration times ranging, from bottom to top, from 0.021 ns
to 4.2 ns. Each forward and reverse distribution has been calculated using 504
work measurements.

Fig. 2 Mean dissipation vs the duration time of the non-equilibrium experiments for the
forward (unfolding) and reverse (refolding) processes of alanine decapeptide in vacuo.

Fig. 3 Jensen-Shannon divergence vs the mean dissipation $0.5(\langle W_f \rangle + \langle W_r \rangle)$ in
alanine decapeptide in vacuo at 300 K. The triangle symbols have been calculated
according to Eq. 3. The circles have been calculated using Eq. 7 of Ref. [ly by using
a $\Delta F$ of 93.3 kJ mol$^{-1}$. The solid line refers to a Gaussian (Markovian) model
such that $\langle W_d \rangle = \beta\sigma^2/2$ in both forward and reverse directions.

Fig. 4 (a): Forward (circles and solid line) and reverse (triangles and dotted line) distributions
for a duration time of $\tau = 0.105$ ns in A$_{10}$ in vacuo at 300 K compared
with the best-fit Markov model (solid and dashed thick lines) satisfying Eq. 2.

(b): Same as in (a) except for a duration time of $\tau = 0.210$ ns.

Fig. 5 Probabilities $p$ and $q$ (see Eq. ^) for a bimodal distribution satisfying the CT
(Eq. ED) as a function of $x = \Delta F - w_1 + \frac{1}{2}\beta\sigma_1^2$ and $y = \Delta F - w_2 + \frac{1}{2}\beta\sigma_2^2$
(RT units).

Fig. 6 True and fitted forward and reverse work distributions in alanine decapeptide
in vacuo at 300 K for various duration times of the non-equilibrium experiments
using the bimodal approach, Eq. 7. The true forward and reverse distributions
are in brown and black, respectively. The fitted forward and reverse distributions
are in violet and blue, respectively. The parameters of the fit can be found in
Table II.
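
The variables $x$ and $y$ appearing in the caption of Fig. 5 can be motivated by a short calculation. The block below is a sketch under two assumptions: that the bimodal forward distribution is a two-Gaussian mixture with weight $p$, and that the Crooks theorem is used in its work-distribution form. The notation $\mathcal{N}(W; w, \sigma^2)$ for a normal density and the closed form for $p$ are introduced here for illustration and may differ from the expressions in the main text.

```latex
\begin{align}
P_F(W) &= p\,\mathcal{N}(W; w_1, \sigma_1^2) + (1-p)\,\mathcal{N}(W; w_2, \sigma_2^2), \\
P_R(-W) &= P_F(W)\, e^{-\beta (W - \Delta F)}, \\
\mathcal{N}(W; w_i, \sigma_i^2)\, e^{-\beta (W - \Delta F)}
  &= e^{\beta \left( \Delta F - w_i + \frac{1}{2}\beta\sigma_i^2 \right)}\,
     \mathcal{N}(W; w_i - \beta\sigma_i^2, \sigma_i^2), \qquad i = 1, 2, \\
\int P_R(-W)\, dW = 1
  &\;\Rightarrow\; p\, e^{\beta x} + (1-p)\, e^{\beta y} = 1
  \;\Rightarrow\; p = \frac{1 - e^{\beta y}}{e^{\beta x} - e^{\beta y}} .
\end{align}
```

The third line follows from completing the square in the exponent. Each reverse component keeps the variance of its forward partner while its mean is lowered by $\beta\sigma_i^2$, and the reverse weights $p\,e^{\beta x}$ and $(1-p)\,e^{\beta y}$ are fixed once $x$ and $y$ are given. A weight between 0 and 1 exists only when $x$ and $y$ have opposite signs.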
FIG. 5: